Online Generation of Locality Sensitive Hash Signatures

Authors

  • Benjamin Van Durme
  • Ashwin Lall
Abstract

Motivated by the recent interest in streaming algorithms for processing large text collections, we revisit the work of Ravichandran et al. (2005) on using the Locality Sensitive Hash (LSH) method of Charikar (2002) to enable fast, approximate comparisons of vector cosine similarity. For the common case of feature updates being additive over a data stream, we show that LSH signatures can be maintained online, without additional approximation error, and with lower memory requirements than when using the standard offline technique.
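
The idea of maintaining a signature online under additive updates can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the random-hyperplane form of LSH, in which each signature bit is the sign of a dot product with a random Gaussian vector, and it assumes each hyperplane component can be regenerated on demand from a seeded pseudo-random generator. The names `projection`, `OnlineSignature`, `offline_signature`, `estimated_cosine`, and the signature length `NUM_BITS` are all hypothetical. Because the dot product is linear, adding each update's contribution to a running sum per bit gives the same signs as projecting the fully aggregated vector, and only the per-bit sums need to be stored.

```python
import math
import random

NUM_BITS = 64  # hypothetical signature length


def projection(bit, feature, seed=0):
    """Pseudo-random N(0, 1) component of hyperplane `bit` for `feature`.

    Seeding from a string is deterministic, so the same value is
    regenerated each time and no hyperplane matrix has to be stored.
    """
    return random.Random(f"{seed}:{bit}:{feature}").gauss(0.0, 1.0)


class OnlineSignature:
    """Keeps one running dot product per signature bit."""

    def __init__(self, num_bits=NUM_BITS):
        self.sums = [0.0] * num_bits

    def update(self, feature, weight=1.0):
        # Additive update: fold the new weight into every projection.
        for i in range(len(self.sums)):
            self.sums[i] += weight * projection(i, feature)

    def bits(self):
        return [1 if s >= 0 else 0 for s in self.sums]


def offline_signature(vector, num_bits=NUM_BITS):
    """Baseline for comparison: project a fully aggregated feature vector."""
    return [
        1 if sum(w * projection(i, f) for f, w in vector.items()) >= 0 else 0
        for i in range(num_bits)
    ]


def estimated_cosine(bits_a, bits_b):
    """Cosine estimate from the fraction of differing signature bits."""
    hamming = sum(x != y for x, y in zip(bits_a, bits_b))
    return math.cos(math.pi * hamming / len(bits_a))
```

In this sketch, streaming the updates `("a", 1.0)`, `("b", 2.0)`, `("a", 1.0)` through `OnlineSignature.update` is intended to yield the same bits as `offline_signature({"a": 2.0, "b": 2.0})`, while holding only `NUM_BITS` floating-point sums in memory instead of the full feature vector.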

Similar resources

Unified Locality-Sensitive Signatures for Transactional Memory

Transactional Memory (TM) systems must record the memory locations read and written by concurrent transactions in order to detect conflicts. Some TM implementations use signatures for this purpose, which summarize read and write sets in bounded hardware at the cost of false positives due to address aliasing. Signatures are usually implemented as two separate (one for reads and another for write...

Efficient Online Locality Sensitive Hashing via Reservoir Counting

We describe a novel mechanism called Reservoir Counting for application in online Locality Sensitive Hashing. This technique allows for significant savings in the streaming setting, allowing for maintaining a larger number of signatures, or an increased level of approximation accuracy at a similar memory footprint.

Privacy Preserving Probabilistic Record Linkage Using Locality Sensitive Hashes

As part of increased efforts to provide precision medicine to patients, large clinical research networks (CRNs) are building regional and national collections of electronic health records (EHRs) and patient-reported outcomes (PROs). To protect patient privacy, each data contributor to the CRN (for example, a health-care provider) uses anonymizing and encryption technology before publishing the d...

Online Learning of Binary Feature Indexing for Real-Time SLAM Relocalization

In this paper, we propose an indexing method for approximate nearest neighbor search of binary features. Unlike the popular Locality Sensitive Hashing (LSH), the proposed method constructs the hash keys by an online learning process instead of pure randomness. In the learning process, the hash keys are constructed with the aim of obtaining uniform hash buckets and high collision ra...

Biometric Hashing Based on Genetic Selection and Its Application to On-Line Signatures

We present a general biometric hash generation scheme based on vector quantization of multiple feature subsets selected with genetic optimization. The quantization of subsets overcomes the dimensionality problem of other hash generation algorithms, while the feature selection step using an integer-coding genetic algorithm makes it possible to exploit all the discriminative information found in large feat...

Publication date: 2010